Serbo-Croatian Hyphenation: A TeX Point of View

Author

  • Cvetana Krstev
Abstract

Serbo-Croatian is one of the South Slavic languages. Like the other Slavic languages, it has a rich morphology. A particular feature of the language is its almost fully phonological orthography: at the word level, each phoneme corresponds to one letter and vice versa. As a result, written text is practically a phonemic transcription of speech. Still, the Serbo-Croatian literary language has two main pronunciations, ekavian and jekavian, which reflect different developments of the Old Slavic sound ѣ (jat). In the ekavian dialect this sound is usually replaced by the vowel e (for instance, dete, mleko, večan, čovek), while in the jekavian dialect it is usually replaced either by the two-syllable group ije (dijete, mlijeko) or by the one-syllable group je (vječan, čovjek). These differences in pronunciation are recorded in written text. Accent has a distinctive role in Serbo-Croatian, and since it is not marked in written texts, there are many homographs. Two alphabets are in use: Latin and Cyrillic. The Serbo-Croatian Latin alphabet differs from the English alphabet: both the letters with diacritics (č, ć, ž, š, đ) and the digraphs (dž, lj, nj) are in use, and each of them has its own place in the alphabet. The Serbo-Croatian Latin alphabet therefore begins a, b, c, č, ć, d, dž, đ, e, and so on. Since the letters q, w, x and y do not exist in the Serbo-Croatian alphabet, it has 30 letters in total. Transcription of foreign words and names is compulsory in Serbo-Croatian of ekavian pronunciation, while jekavian pronunciation allows the orthography of the source language. All the letters with diacritics are assigned separate keys on the standardized national keyboard, as well as positions in the national version of the 7-bit code [1, 2, 3], but neither keys nor codes are provided for the digraphs, so they are entered by striking two keys, i.e. by entering two codes.
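Because the digraphs count as single letters with their own positions, alphabetical order cannot simply follow character codes. The following minimal Python sketch illustrates this; it is not the author's method, the names `ALPHABET`, `split_letters` and `collation_key` are hypothetical, and it assumes that every dž, lj or nj pair is a digraph, which misreads genuine consonant clusters.

```python
import unicodedata

# Serbo-Croatian Latin alphabet in collation order: 30 letters, with the
# digraphs dž, lj and nj each occupying its own position.
ALPHABET = [
    "a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f",
    "g", "h", "i", "j", "k", "l", "lj", "m", "n", "nj",
    "o", "p", "r", "s", "š", "t", "u", "v", "z", "ž",
]
RANK = {letter: rank for rank, letter in enumerate(ALPHABET)}
DIGRAPHS = {"dž", "lj", "nj"}


def split_letters(word: str) -> list[str]:
    """Split a word into alphabet letters, reading dž/lj/nj as digraphs.

    Assumption: every such pair is treated as a digraph, so real consonant
    clusters (the d + ž of 'nadživeti') are misread by this greedy rule.
    """
    word = unicodedata.normalize("NFC", word.lower())
    letters = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            letters.append(pair)
            i += 2
        else:
            letters.append(word[i])
            i += 1
    return letters


def collation_key(word: str) -> list[int]:
    """Sort key: alphabet rank of each letter (KeyError on other characters)."""
    return [RANK[letter] for letter in split_letters(word)]


words = ["njiva", "noga", "lopta", "ljubav", "lice"]
print(sorted(words))                     # ['lice', 'ljubav', 'lopta', 'njiva', 'noga']
print(sorted(words, key=collation_key))  # ['lice', 'lopta', 'ljubav', 'noga', 'njiva']
```

Plain code-point sorting puts ljubav before lopta, while the alphabet order puts the letter lj after l, so every word starting with lj follows every word starting with l.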
Besides that, although the standard provides a separate key for the letter đ, the keyboards of old typewriters often did not have it. As a result, this letter was, and sometimes still is, recorded as the digraph dj, in spite of orthographic rules. Serbo-Croatian Cyrillic has the 30 equivalent letters but with neither diacritics nor digraphs. The order of the letters in the Serbo-Croatian Cyrillic alphabet is completely different from the order in the Latin alphabet. The Serbo-Croatian Cyrillic alphabet also differs from the Russian alphabet: it contains letters that do not exist in Russian Cyrillic (ђ, ј, љ, њ, ћ, џ), and vice versa. This matters because Russian Cyrillic was the basis for the development of the relevant international coding standards. The digraphs of the Serbo-Croatian Latin alphabet can cause problems for formatting and typesetting programs, particularly for hyphenation and for automatic transcription from the Latin to the Cyrillic alphabet. Each of the combinations lj, nj, dž and dj can cause such problems, because in a text it may represent either a digraph or a consonant cluster. A digraph is always transcribed into one Cyrillic letter and is never hyphenated. For instance, nadžak-baba is transcribed into наџак-баба, and in both alphabets it is hyphenated as na-džak-ba-ba. On the other hand, a consonant cluster is always transcribed into two Cyrillic letters and can, in principle, be hyphenated. For instance, nadživeti is transcribed into надживети and is hyphenated as nad-ži-ve-ti.
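The transcription ambiguity can be made concrete with a small Python sketch. It is an illustration only, not the author's TeX-based approach: `to_cyrillic` and the `|` cluster marker are hypothetical, and the sketch simply assumes that a person has marked consonant clusters by hand, since plain text does not carry that information. Unmarked dž, lj and nj are read as digraphs, and dj is always read as d + j.

```python
import unicodedata

# Latin -> Cyrillic map for the 30 letters of each Serbo-Croatian alphabet.
LATIN_TO_CYRILLIC = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д",
    "dž": "џ", "đ": "ђ", "e": "е", "f": "ф", "g": "г", "h": "х",
    "i": "и", "j": "ј", "k": "к", "l": "л", "lj": "љ", "m": "м",
    "n": "н", "nj": "њ", "o": "о", "p": "п", "r": "р", "s": "с",
    "š": "ш", "t": "т", "u": "у", "v": "в", "z": "з", "ž": "ж",
}
DIGRAPHS = {"dž", "lj", "nj"}
# Hypothetical convention for this sketch: a '|' typed between two letters
# marks a consonant cluster, e.g. 'nad|živeti'.
CLUSTER_MARK = "|"


def to_cyrillic(text: str) -> str:
    """Transcribe Serbo-Croatian Latin text into lowercase Cyrillic.

    Unmarked dž/lj/nj pairs are read as digraphs (one Cyrillic letter);
    'dj' is always read as d + j, so legacy 'dj' spellings of đ are not
    recognized. Characters outside the map (e.g. '-') pass through unchanged.
    """
    text = unicodedata.normalize("NFC", text.lower())
    out = []
    i = 0
    while i < len(text):
        if text[i] == CLUSTER_MARK:
            i += 1  # drop the marker; it only blocks the digraph reading
            continue
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            out.append(LATIN_TO_CYRILLIC[pair])
            i += 2
        else:
            out.append(LATIN_TO_CYRILLIC.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(to_cyrillic("nadžak-baba"))  # наџак-баба (dž is a digraph)
print(to_cyrillic("nadživeti"))    # наџивети (wrong: the cluster was misread)
print(to_cyrillic("nad|živeti"))   # надживети (d + ž kept as two letters)
```

The same letter segmentation also tells a hyphenation routine where it must not break: a dž, lj or nj read as a digraph maps to a single Cyrillic letter and stays together (na-džak-ba-ba), while a marked cluster may be split (nad-ži-ve-ti).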


Similar resources

A model of the perception of Serbo-Croatian word tone

Purcell (1979) presented data on the perception of Serbo-Croatian word tone by native speakers. The present paper develops a logistic regression model of the perception of Serbo-Croatian word tone using Purcell’s 1979 data. Two models are developed: an overall model and a two-part, split model. Model fits are calculated and plotted. The two-part model fits the perceptual data better. Model coeff...


Visual Word Recognition in Serbo-Croatian Is Necessarily Phonological

In a naming task conducted with bi-alphabetic readers of Serbo-Croatian, it was shown that letter strings that can be assigned both a Roman and a Cyrillic alphabet reading incur longer latencies than the unique alphabet transcription of the same word, and that the magnitude of the difference depended on the number of ambiguous characters in the ambiguous letter string. While this within-word p...


Automatic Prosody Generation for Serbo-Croatian Speech Synthesis Based on Regression Trees

The paper presents the module for automatic generation of prosodic features of synthesized speech, namely, f0 targets and phonetic segment durations, within the speech synthesizer AlfaNumTTS, the most sophisticated speech synthesis system for Serbo-Croatian language to date. The module is based on regression trees trained on a studio recorded single speaker database of Serbo-Croatian. The datab...


Transcribing Multilingual Broadcast News Using Hypothesis Driven Lexical Adaptation

This paper describes first results of our DARPA-sponsored efforts toward recognizing and browsing foreign language, more specifically, Serbo-Croatian broadcast news. For Serbo-Croatian as well as many other than the most common well studied languages, the problems of broadcast quality recognition are complicated by 1.) the lack of available acoustic and language data, and 2.) the excessive voca...


Strategies for visual word recognition and orthographical depth: a multilingual comparison.

We investigated the psychological reality of the concept of orthographical depth and its influence on visual word recognition by examining naming performance in Hebrew, English, and Serbo-Croatian. We ran three sets of experiments in which we used native speakers and identical experimental methods in each language. Experiment 1 revealed that the lexical status of the stimulus (high-frequency wo...




Publication date: 2011